spec : fix draft model checkpoints #22521

Merged

ggerganov merged 3 commits into master from gg/spec-fix-draft-checkpoints on Apr 30, 2026

Conversation

@ggerganov
Member

Overview

cont #19493

Improve the logic for when to create and restore the draft model checkpoints. The old logic discarded the checkpoint on every new completion request, resulting in long draft-model recomputes during large agentic sessions.
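
A minimal sketch of the retention idea, for illustration only; it is not the llama.cpp implementation, and all names in it (`draft_checkpoint`, `common_prefix_len`, `tokens_to_recompute`) are hypothetical. The point is that a checkpoint kept across requests lets the draft model reuse the shared prompt prefix instead of recomputing its KV cache from scratch:

```cpp
// Hypothetical sketch (not llama.cpp code): keep the draft checkpoint across
// completion requests and reuse the longest shared prompt prefix.
#include <cstddef>
#include <cstdio>
#include <optional>
#include <vector>

using token = int;

struct draft_checkpoint {
    std::vector<token> tokens; // tokens already in the draft model's KV cache
};

// length of the shared prefix between the cached tokens and the new prompt
static size_t common_prefix_len(const std::vector<token> & a, const std::vector<token> & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        n++;
    }
    return n;
}

// old behavior: discard the checkpoint on every request, recomputing the whole
// prompt; the fix sketched here keeps it and recomputes only the new suffix
static size_t tokens_to_recompute(std::optional<draft_checkpoint> & ckpt, const std::vector<token> & prompt) {
    const size_t reuse = ckpt ? common_prefix_len(ckpt->tokens, prompt) : 0;
    ckpt = draft_checkpoint{prompt}; // refresh the checkpoint for the next request
    return prompt.size() - reuse;    // only the non-shared suffix needs compute
}

int main() {
    std::optional<draft_checkpoint> ckpt;

    const std::vector<token> req1 = {1, 2, 3, 4};
    const std::vector<token> req2 = {1, 2, 3, 4, 5, 6}; // agentic follow-up extending req1

    printf("request 1 recomputes %zu tokens\n", tokens_to_recompute(ckpt, req1)); // 4
    printf("request 2 recomputes %zu tokens\n", tokens_to_recompute(ckpt, req2)); // 2, thanks to reuse
}
```

With the old discard-on-request behavior, the second request would recompute all 6 tokens; keeping the checkpoint cuts that to the 2 new ones, which is what avoids the long draft-model recomputes in large agentic sessions.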

ggerganov marked this pull request as ready for review April 29, 2026 17:50
ggerganov requested review from a team as code owners April 29, 2026 17:50
ggerganov merged commit 80afa33 into master Apr 30, 2026
45 of 46 checks passed
ggerganov deleted the gg/spec-fix-draft-checkpoints branch April 30, 2026 05:32
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* spec : fix draft model checkpoints

* cont : clean-up

* cont : gate the ngram-mod reset warning behind verbose flag